36 research outputs found

    Emerging multidisciplinary research across database management systems

    The database community is exploring more and more multidisciplinary avenues: data semantics overlaps with ontology management; reasoning tasks venture into the domain of artificial intelligence; and data stream management and information retrieval shake hands, e.g., when processing Web click-streams. These new research avenues become evident, for example, in the topics that doctoral students choose for their dissertations. This paper surveys the emerging multidisciplinary research by doctoral students in database systems and related areas. It is based on PIKM 2010, the 3rd Ph.D. workshop at the International Conference on Information and Knowledge Management (CIKM). The topics addressed include ontology development, data streams, natural language processing, medical databases, green energy, cloud computing, and exploratory search. In addition to core ideas from the workshop, we list some open research questions in these multidisciplinary areas.

    Quality-Driven Disorder Handling for M-way Sliding Window Stream Joins

    Sliding window join is one of the most important operators for stream applications. To produce high-quality join results, a stream processing system must deal with the ubiquitous disorder within input streams, which is caused by network delay, asynchronous source clocks, etc. Disorder handling involves an inevitable tradeoff between the latency and the quality of produced join results. To meet the differing requirements of stream applications, it is desirable to provide a user-configurable result-latency vs. result-quality tradeoff. Existing disorder handling approaches either do not provide such configurability, or support only user-specified latency constraints. In this work, we advocate the idea of quality-driven disorder handling, and propose a buffer-based disorder handling approach for sliding window joins which minimizes the sizes of input-sorting buffers, and thus the result latency, while respecting user-specified result-quality requirements. The core of our approach is an analytical model which directly captures the relationship between the sizes of input buffers and the produced result quality. Our approach is generic: it supports m-way sliding window joins with arbitrary join conditions. Experiments on real-world and synthetic datasets show that, compared to the state of the art, our approach can reduce the result latency incurred by disorder handling by up to 95% while providing the same level of result quality.
    Comment: 12 pages, 11 figures, IEEE ICDE 201
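    The input-sorting buffer described above can be illustrated with a simple K-slack-style policy (a minimal sketch, not the paper's quality-driven model; class and parameter names are hypothetical): tuples are held until a watermark, derived from the largest timestamp seen minus a slack k, passes them, so a smaller k means lower result latency but more residual disorder.

```python
import heapq

class SortingBuffer:
    """K-slack-style input-sorting buffer: tuples are held until the
    watermark (max timestamp seen minus k) passes them. A smaller k
    lowers latency but risks emitting streams with residual disorder."""
    def __init__(self, k):
        self.k = k                      # slack, i.e., the latency budget in time units
        self.heap = []                  # min-heap ordered by tuple timestamp
        self.max_ts = float("-inf")

    def insert(self, ts, value):
        self.max_ts = max(self.max_ts, ts)
        heapq.heappush(self.heap, (ts, value))
        out = []
        # release everything at or below the watermark, in timestamp order
        while self.heap and self.heap[0][0] <= self.max_ts - self.k:
            out.append(heapq.heappop(self.heap))
        return out

buf = SortingBuffer(k=2)
emitted = []
for ts, v in [(1, "a"), (3, "b"), (2, "c"), (6, "d"), (5, "e")]:
    emitted += buf.insert(ts, v)
# tuples come out sorted by timestamp despite arriving out of order
```

    In a join operator, one such buffer would sit in front of each of the m input streams, and the quality-driven approach of the paper would choose each k from its analytical model rather than fixing it by hand.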

    FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory

    The advent of Storage Class Memory (SCM) is driving a rethink of storage systems towards a single-level architecture where memory and storage are merged. In this context, several works have investigated how to design persistent trees in SCM as a fundamental building block for these novel systems. However, these trees are significantly slower than their DRAM-based counterparts, since trees are latency-sensitive and SCM exhibits higher latencies than DRAM. In this paper, we propose a novel hybrid SCM-DRAM persistent and concurrent B-Tree, named the Fingerprinting Persistent Tree (FPTree), that achieves similar performance to DRAM-based counterparts. In this novel design, leaf nodes are persisted in SCM while inner nodes are placed in DRAM and rebuilt upon recovery. The FPTree uses Fingerprinting, a technique that limits the expected number of in-leaf probed keys to one. In addition, we propose a hybrid concurrency scheme for the FPTree that is partially based on Hardware Transactional Memory. We conduct a thorough performance evaluation and show that the FPTree outperforms state-of-the-art persistent trees under different SCM latencies by up to a factor of 8.2. Moreover, we show that the FPTree scales very well on a machine with 88 logical cores. Finally, we integrate the evaluated trees in memcached and a prototype database. We show that the FPTree incurs an almost negligible performance overhead over using fully transient data structures, while significantly outperforming other persistent trees.
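    The fingerprinting idea can be sketched in a few lines (an illustration under assumed details, not the FPTree implementation; the hash choice and leaf layout are hypothetical): each leaf stores a contiguous array of one-byte key hashes, and a lookup only performs a full key comparison on slots whose fingerprint matches, which keeps the expected number of probed keys near one.

```python
import hashlib

def fingerprint(key):
    """One-byte hash of the key, stored contiguously in the leaf."""
    return hashlib.blake2b(key.encode(), digest_size=1).digest()[0]

class Leaf:
    """Unsorted leaf with a fingerprint array scanned before any full
    key comparison (a sketch of the Fingerprinting technique)."""
    def __init__(self):
        self.fps = []    # contiguous fingerprint array, cheap to scan
        self.keys = []
        self.vals = []

    def insert(self, key, val):
        self.fps.append(fingerprint(key))
        self.keys.append(key)
        self.vals.append(val)

    def lookup(self, key):
        fp = fingerprint(key)
        # only slots whose one-byte fingerprint matches are probed, so
        # most non-matching keys are skipped without comparing them
        for i, f in enumerate(self.fps):
            if f == fp and self.keys[i] == key:
                return self.vals[i]
        return None
```

    In the actual design the leaf lives in SCM and the fingerprint scan avoids expensive SCM reads of the keys themselves; the sketch only shows the probe-filtering logic.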

    SPA: Economical and workload-driven indexing for data analytics in the cloud

    Selective queries are not uncommon in large-scale data analytics, for example, when drilling down into a specific customer in a dashboard. Traditionally, selective queries are accelerated by creating secondary indexes. However, because of their large size, expensive maintenance, and difficulty to tune and automate, indexes are typically not used in modern cloud data warehouses or data lakes. Instead, such systems rely mostly on full table scans and lightweight optimizations like min/max filtering, whose effectiveness depends heavily on the data layout and value distributions. We propose SPA as the vision for automatically optimizing selective queries for immutable copy-on-write data formats. SPA adaptively indexes subsets of the data in an incremental and workload-driven manner. It makes fine-grained decisions and continuously monitors their benefit, dynamically allocating an optimization budget in a way that bounds the additional cost of indexing. Furthermore, it guarantees a performance improvement in the cases where indexes - potentially partial ones - prove to be beneficial. When indexes lose their benefit due to a shifting workload, they are gradually deconstructed in favor of optimizations that accommodate recent trends. As SPA does not require information about updates performed on the data, it can also be employed as an accelerator for systems that do not control the data, e.g., in cloud data lakes.
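    The workload-driven, budget-bounded idea can be sketched as follows (a deliberately simplified illustration, not SPA's actual decision logic; all names and thresholds are hypothetical): key ranges that are queried often enough earn a partial index, paid for out of a fixed budget that bounds the indexing cost, while cold ranges fall back to scans with min/max filtering.

```python
from collections import Counter

class PartialIndexer:
    """Sketch of workload-driven partial indexing: hot key ranges earn a
    partial index, paid for out of a fixed budget, so the extra cost of
    indexing stays bounded no matter how the workload shifts."""
    def __init__(self, budget, hot_threshold=3):
        self.budget = budget                # max number of indexed ranges
        self.hot_threshold = hot_threshold  # queries before a range is "hot"
        self.hits = Counter()               # queries observed per range
        self.indexed = set()                # ranges with a partial index

    def query(self, key_range):
        self.hits[key_range] += 1
        if key_range in self.indexed:
            return "index scan"
        if (self.hits[key_range] >= self.hot_threshold
                and len(self.indexed) < self.budget):
            # incrementally build a partial index for this hot range
            self.indexed.add(key_range)
            return "index scan"
        return "full scan with min/max filtering"
```

    A fuller sketch would also decay the hit counters and evict indexes for ranges that go cold, mirroring the gradual deconstruction the abstract describes.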

    View evolution support for information integration systems over dynamic distributed information spaces.

    Challenging issues for creating and maintaining tailored information gathering systems over large-scale information spaces (e.g., Digital Libraries, the World Wide Web) include the diversity of the information sources (ISs) in terms of their structures, query interfaces and search engines, as well as the dynamics of sources continuously being added, removed or upgraded. Current information integration systems are often based on static, a priori defined views that gather information from heterogeneous information sources and provide the user with a uniform view of the information space for browsing and querying. This dissertation addresses one of the largely unexplored issues that such information integration systems raise, namely, the evolution and maintenance of data warehouses when the underlying information sources change their capabilities, i.e., schema-level changes. The overall solution approach that this dissertation puts forth consists of defining the problem of view evolution triggered by capability changes of ISs and designing evolution algorithms that achieve synchronization of the affected views in the presence of these types of changes.
    The contributions made by this dissertation include (1) an extension of the SQL view definition language that allows the user to specify evolution preferences a priori, e.g., whether dropping or changing a view component is acceptable; (2) a formal definition of what constitutes a legal view rewriting under capability changes, i.e., the semantics of view evolution; (3) algorithms for view synchronization that find a modified view definition in response to a capability change of an IS; (4) algorithms for the maintenance of materialized views after the view synchronization process; (5) experimental evaluations comparing the maintenance strategies after view synchronization with alternative maintenance techniques; and (6) the development of a working system incorporating some of the proposed evolution algorithms.
    Ph.D. dissertation, Applied Sciences (Computer Science), University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/131740/2/9929909.pd
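    The evolution preferences in contribution (1) can be illustrated with a small sketch (all names and the flag vocabulary are hypothetical and do not reflect the actual extended-SQL syntax): each view attribute records whether it may be dropped, and synchronization either removes dispensable attributes lost to a capability change or reports that no legal rewriting exists.

```python
# Hypothetical sketch of a view definition carrying evolution preferences:
# each attribute says whether it may be dropped when its information
# source removes the underlying column.
view = {
    "name": "customer_orders",
    "attributes": [
        {"col": "cust_id",  "dispensable": False},
        {"col": "order_id", "dispensable": False},
        {"col": "discount", "dispensable": True},   # may be dropped
    ],
}

def synchronize(view, dropped_cols):
    """Return a rewritten view after a capability change, or None if the
    view becomes undefined (an indispensable attribute was lost)."""
    kept = []
    for attr in view["attributes"]:
        if attr["col"] in dropped_cols:
            if not attr["dispensable"]:
                return None          # no legal rewriting exists
            continue                 # silently drop the dispensable attribute
        kept.append(attr)
    return {**view, "attributes": kept}

# the source drops "discount": the view survives without it
evolved = synchronize(view, {"discount"})
```

    The dissertation's algorithms go further, finding substitute sources for affected components rather than only dropping them, but the flag-driven accept-or-fail decision above is the core of the preference mechanism.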

    Using Complex Substitution Strategies for View Synchronization

    Large-scale information systems typically contain autonomous information sources (ISs) that dynamically modify their content, their interfaces, and their query services, regardless of the data warehouses (views) that are built on top of them. Current view technology fails to provide adaptation techniques for such changes, supporting only static views in the sense that views become undefined when ISs undergo capability changes. We propose to address this new view evolution problem - which we call view synchronization - by allowing view definitions to be dynamically evolved when they become undefined. The foundations of our approach to view synchronization include: the EvolvableSQL view definition language (E-SQL), the model for information source description (MISD), and the concept of legal view rewritings. In this paper, we now introduce the concept of the strongest synch-equivalent view definition that explicitly defines the evolution semantics associated with an E-SQL…

    PIKM 2010: ACM Workshop for Ph.D. Students in Information and Knowledge Management

    The PIKM workshop focuses on papers consisting mainly of the Ph.D. dissertation proposals of doctoral students. A wide range of topics in databases, information retrieval and knowledge management is presented at this workshop. The areas of interest are similar to those of the three respective tracks at the CIKM main conference. Interdisciplinary work across these tracks is encouraged.

    Real-Time Networking over HIPPI

    HIPPI provides a very-high-speed communication medium, which is very well suited to a large number of bandwidth-demanding distributed applications. Unfortunately, its circuit-switched nature makes it very difficult to provide real-time guarantees when connections contend for network resources. We present a time-division-multiplex access scheme designed to give timing guarantees to high-speed connections. We describe the problem of scheduling access to a HIPPI network, and show that, although the problem is very unlikely to be computationally tractable, very simple heuristics give high network utilization for moderately sized networks. We present the RMP/RMCP protocol, our implementation of the scheme described in this paper on the XUNET-West HIPPI testbed.
    1 Introduction. A large number of applications in distributed control, distributed virtual reality, and remote laboratory work demand hard delay guarantees in order to satisfy the timing requirements of their time-critical com…
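    A greedy first-fit slot-assignment heuristic of the simple kind the abstract alludes to can be sketched as follows (an illustration only, not the paper's scheduler or the RMP/RMCP protocol): two connections may share a time slot only if their source and destination ports are disjoint, matching the crossbar nature of a circuit-switched switch.

```python
def assign_slots(connections):
    """Greedy first-fit TDMA slot assignment: place each (src, dst)
    connection in the first slot where neither of its ports is already
    in use, opening a new slot only when no existing slot fits."""
    slots = []        # each slot: set of ports in use during that slot
    schedule = {}     # connection -> slot index
    for conn in connections:
        src, dst = conn
        for i, used in enumerate(slots):
            if src not in used and dst not in used:
                used.update((src, dst))
                schedule[conn] = i
                break
        else:
            slots.append({src, dst})
            schedule[conn] = len(slots) - 1
    return schedule

conns = [("A", "B"), ("A", "C"), ("D", "B"), ("D", "C")]
sched = assign_slots(conns)
# ("A","B") and ("D","C") share slot 0; ("A","C") and ("D","B") share slot 1
```

    Minimizing the number of slots is essentially an edge-coloring problem, which is why the paper observes that exact scheduling is unlikely to be tractable while simple heuristics like this one still utilize the network well.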